TextPipe: Online Help
    Optimizing Performance
 

Disk I/O

The slowest operation that TextPipe performs is reading from and writing to disk. You can improve performance by making sure that all files being processed are stored on local disks rather than on network servers. You can also increase speed by as much as an order of magnitude by using a RAM drive (a disk held in memory), although naturally this won't help if the files you process are very large.

TextPipe uses specific Windows API calls to speed up the reading of data files.

Temporary files

TextPipe doesn't use temporary files at all, except for sorting, where they are unavoidable. Otherwise, the only file TextPipe ever writes is the output file itself. While the output is being written it has a file name like TXPxxx.tmp; once the file is completely written out, TextPipe renames it to the actual output filename.
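
The write-then-rename approach described above can be sketched in Python. This is only an illustration of the general technique, not TextPipe's own code: the function name write_output_atomically is hypothetical, and placing the temporary file in the same folder as the final file is an assumption made so that the rename stays on one volume.

```python
import os
import tempfile


def write_output_atomically(final_path, lines):
    """Write lines to a temporary file, then rename it to final_path."""
    folder = os.path.dirname(os.path.abspath(final_path))
    # Assumption: the temporary file lives next to the final file.
    fd, tmp_path = tempfile.mkstemp(prefix="TXP", suffix=".tmp", dir=folder)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp:
            tmp.writelines(lines)
        # Only a completely written file ever appears under the real name.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the partial temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader of the output folder therefore sees either the previous output file or the finished new one, never a half-written file under the real name.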

If you have enough memory, the entire sort is performed in memory for speed. Every 10000 lines, TextPipe checks whether less than 16MB of physical memory (not virtual memory!) is available. If so, it writes the sorted results so far to a temporary file and then continues. If the Output Filter is set to File Output or Single File Output, any temporary files are written to the same folder as the output file. If the Output Filter is set to Clipboard Output, any temporary files are written to the current folder. All temporary files are removed as soon as possible as the sort progresses.
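
As a rough sketch of this kind of sort, the Python below keeps lines in memory, spills sorted runs to temporary files, and merges the runs at the end. It is not TextPipe's implementation. In particular, the spill is triggered here by a simple line-count limit (the hypothetical max_lines_in_memory parameter) standing in for TextPipe's physical memory check, and the function name and tmp_dir parameter are also assumptions.

```python
import heapq
import os
import tempfile


def sort_lines_with_spill(lines, out, max_lines_in_memory=10000, tmp_dir=None):
    """Sort newline-terminated lines into out, spilling sorted runs to disk."""
    run_paths = []
    buffer = []

    def spill():
        # Write the sorted results so far to a temporary file, then continue.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".tmp", dir=tmp_dir)
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as run:
            run.writelines(buffer)
        run_paths.append(path)
        buffer.clear()

    for line in lines:
        buffer.append(line)
        if len(buffer) >= max_lines_in_memory:  # stand-in for the memory check
            spill()

    buffer.sort()
    runs = [open(p, "r", encoding="utf-8", newline="") for p in run_paths]
    try:
        # Merge the in-memory remainder with every sorted run on disk.
        out.writelines(heapq.merge(buffer, *runs))
    finally:
        for run in runs:
            run.close()
        for path in run_paths:
            os.remove(path)  # temporary files are removed once merged
```

If the input never reaches the limit, no temporary file is created and the whole sort happens in memory, which matches the fast path described above.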

Pattern matching

You can improve matching performance by an order of magnitude by letting patterns fail earlier, which you do by limiting what wildcards like .* can match. If you can replace .* (match any character 0 or more times) with [^>]* (match any character except '>' 0 or more times) or [^>]{0,200} (match any character except '>' up to 200 times), then your patterns will match or fail to match far more quickly.
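
The effect of limiting a wildcard can be tried out with any perl-style regular expression engine. The sketch below uses Python's re module as a stand-in, which is an assumption: TextPipe's own engine may behave differently in detail. The sample text and variable names are invented for illustration.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

unbounded = re.compile(r'<.*>')        # .* may run to the end of the line
negated = re.compile(r'<[^>]*>')       # cannot run past the next '>'
capped = re.compile(r'<[^>]{0,200}>')  # also gives up after 200 characters

tags_negated = negated.findall(text)
tags_capped = capped.findall(text)
whole_line = unbounded.search(text).group(0)

# An unclosed '<' followed by a long run of text: every pattern fails here,
# but the capped pattern only has to look at 200 characters before failing.
no_close = '<' + 'x' * 100000
failures = [p.search(no_close) for p in (unbounded, negated, capped)]
```

Here the negated and capped patterns pick out each tag separately, while the unbounded pattern swallows the whole line in a single match.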

Greedy matching is almost always slower than non-greedy matching, and if the maximum allowed match size is very large then performance will slow to a crawl.

You should avoid patterns of the type

([^">])+

where the round brackets are unnecessary. Each set of round brackets causes a recursive step. This pattern should be written as:

[^">]+

When the match can occur a large number of times, it can often be rewritten as

[^">]{1,10}     or     ([^">]){1,10}

with no change to the pattern matching process. This avoids too many recursive steps.
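
To check that dropping the round brackets does not change what is matched, the patterns can be compared side by side. This sketch uses Python's re module as a stand-in for TextPipe's engine (an assumption), with an invented sample string.

```python
import re

sample = 'title">rest of line'

grouped = re.match(r'([^">])+', sample)     # round brackets around one character
plain = re.match(r'[^">]+', sample)         # same match, no group
capped = re.match(r'[^">]{1,10}', sample)   # same match while the run is 10 or fewer

matched = [m.group(0) for m in (grouped, plain, capped)]
```

All three match the same text here. The capped form only differs when the run of matching characters is longer than its upper limit.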

Use Your Head!

See how many problems you can spot with this filter:

|--Restrict to each line in turn
| |
| |--Convert End of Lines - Auto to DOS
| | [X] Remove bad EOL
| | 
| |--Replace [|] with []
| | [ ] Match case
| | [ ] Whole words only
| | [ ] Case sensitive replace
| | [ ] Prompt on replace
| | [ ] Skip prompt if identical
| | [ ] First only
| | [ ] Extract matches
| | 
| |--Replace [)] with []
| | [ ] Match case
| | [ ] Whole words only
| | [ ] Case sensitive replace
| | [ ] Prompt on replace
| | [ ] Skip prompt if identical
| | [ ] First only
| | [ ] Extract matches
| | 
| +--Replace [)] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| 
|--Restrict to each line in turn
| |
| +--Insert column 0 [@77777777]
| 
|--Restrict to each line in turn
| |
| +--Insert column 0 [@88888]
| 
|--Restrict to each line in turn
| |
| +--Insert column 0 [@99999]
| 
|--Replace [@] with [|]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches

Problems with this filter, from top to bottom:

The first "Restrict to each line in turn" filter is unnecessary: all the filters inside it are already effectively restricted to each line.

Replace filters #1, #2 and #3: since the replacement text for all three is the same (blank), we could combine all three and use a
perl pattern searching for:  [()|]    OR
an EasyPattern searching for:  [ '(' or ')' or '|' ]

The three "Restrict to each line in turn" filters around the Insert column filters are unnecessary: the insert column filter does not need them.

Insert column filters #1, #2 and #3 could be combined, to insert all columns at the same time.

The final Replace [@] with [|] filter could have been done in Insert column filters #1, #2 and #3 above, removing the need for a search/replace filter.
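
The saving from combining these filters can be illustrated outside TextPipe. The Python sketch below compares the step-by-step version with a combined one. It rests on assumptions: that "Insert column 0" places the text at the start of each line, and that the three removals cover '(', ')' and '|' as in the perl pattern [()|]. The function and constant names are hypothetical.

```python
import re

STRIP = re.compile(r"[()|]")       # one pass instead of three Replace filters
PREFIX = "|99999|88888|77777777"   # assumed net effect of the inserts plus '@' -> '|'


def original_steps(line):
    """Mimic the filter list one step at a time."""
    for ch in "|()":
        line = line.replace(ch, "")
    for column in ("@77777777", "@88888", "@99999"):
        line = column + line       # assumed meaning of "Insert column 0"
    return line.replace("@", "|")


def combined_steps(line):
    """One removal pass and one insert, with no final search/replace."""
    return PREFIX + STRIP.sub("", line)
```

The two versions agree for lines that contain no '@' of their own. Where a line does contain '@', the original filter list would also turn it into '|', so the combined version is the safer of the two.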


Scripting

Use of the scripting filter will slow TextPipe down because of the extra COM overhead of passing data to and from the script, and the script code being interpreted rather than compiled. Ensure that the code you write is as efficient as possible. You may be able to avoid the use of a script by judicious use of patterns, but this may be more complicated to understand. Use comments liberally - they do not slow processing down at all.

TextPipe Engine

If performance is key, you can take advantage of our embeddable TextPipe Engine. While the filter setup is done via scripting, the actual filtering is done with no GUI feedback and no error logging, for maximum performance.

Professional Services

Our consulting team offers filter optimization as a service. Please see our website for details.

 

 

 © 1999-2005 Crystal Software. All rights reserved.